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Abstract 

It is shown how traditional development of theories of fluids based upon the 
concept of physical clustering can be adapted to an alternative local clus- 
tering definition. The alternative clustering definition can preserve a detailed 
valence description of the interactions between a solution species and its near- 
neighbors, i.e., cooperativity and saturation of coordination for strong asso- 
ciation. These clusters remain finite even for condensed phases. The simplest 
theory to which these developments lead is analogous to quasi-chemical the- 
ories of cooperative phenomena. The present quasi-chemical theories require 
additional consideration of packing issues because they don't impose lattice 
discretizations on the continuous problem. These quasi-chemical theories do 
not require pair decomposable interaction potential energy models. Since cal- 
culations may be required only for moderately sized clusters, we suggest that 
these quasi-chemical theories could be implemented with computational tools 
of current molecular electronic structure theory. This can avoid an interme- 
diate step of approximate force field generation. 



1. Introduction 



Recent molecular calculations Jl],0] have suggested in new ways that a chemical perspec- 
tive can be helpful in computing thermodynamic properties of water and aqueous solutions. 
This paper follows-up those observations and develops quasi-chemical theories for the ther- 
modynamics of associated liquids. 

The most direct antecedent of this effort was the recent calculation of the absolute 
hydration free energy of the ferric ion (Fe 3+ ) 0. That calculation used electronic structure 
results on the Fe(H20)6 3+ cluster and a simple, physical estimate of further solvation effects. 
Those results were organized according to the pattern of a simple chemical reaction and 
a surprisingly accurate evaluation of the hydration free energy was obtained. A second 
important antecedent of this work was the recent calculation of the free energy contribution 
due to electrostatic interactions in liquid water QTJ] . That work systematically exploited the 
observed distribution of close neighbors of a water molecule, the distribution of clusters, 
in liquid water in order to obtain an accurate, simple treatment of the thermodynamics of 
electrostatic interactions in water. The work below is a basic theoretical investigation of the 
success of these recent calculations. 

The developments below are based upon formal theories of association equilibrium that 
have a long history [§-^j. Formal developments of that kind have been directed in several 
different ways. One is the study of mechanisms of condensation pHTDl . A second is towards 



development of theories of molecular liquids in which the molecular species of interest are 
formed by combination of atoms |IT| , [12| . A third way is towards a theory of associated 
liquids, like liquid water [0-|23|. The latter two goals often overlap and the formal theory 



is interesting more broadly [23]. 

For liquid water, the idea of distinguishable association species is a firmly entrenched 
historical precedent [p4jp5|. These ideas are called "mixture models" [^6H 30[|. Though the 
mixture models are common and intuitive ideas, they have never been developed to a satis- 
factory theoretical conclusion. The available computer simulation data has always suggested 
that the molecular description of liquid water is more subtle than is typically imagined when 
mixture models are discussed H3~I| . It should be noted that the search continues for structural 
species of special significance in the computer simulation of aqueous solutions f3lH37lh but 
the theoretical connection of such structures to the experimental thermodynamics requires 
further elaboration. One goal of this work is to clarify the foundations of those ideas and 
efforts. 

The theory developed below is akin to good approximations of historical and pedagogi- 



cal importance in the areas of cooperative phenomena and phase transitions | 3B|| . In those 
areas, similar approximations are called Guggenheim, quasi-chemical, or Bethe approxima- 
tions. However, those treatments typically have been developed for lattice gases, utilizing 
specialized considerations appropriate for those specialized settings. Our work here empha- 
sizes application to fluids, without initial lattice discretizations, and utilizing modern tools of 
computational chemistry. As one example, some packing issues that are typically preempted 
by lattice gas assumptions must be addressed explicitly here. Thus, these derivations and 
the principal results of them have not been given before. 

As a final introductory point, we note the applications of quasi-chemical approximations 
in treating lattice models of water [BUJ-EBJ. Those efforts may have received less attention 



than they deserve because of their lack of conventional molecular realism. Indeed, such 
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calculations sometimes must make arbitrary, prior decisions that may preclude answers to 
subsequent questions. For example, if pentagonal H-bond rings, or some other specific 
structure pI] , ffE| )ff7|| , were crucial for a particular phenomenon, that issue might have to be 
specifically anticipated and accomodated in typical lattice gas treatments. However, the 
general success of these approaches should teach us something about how to formulate less 
restricted theories of liquid water and learning those lessons is also one of the goals of this 
paper. 



2. Theory 

We consider two different clustering concepts. The first clustering concept is the more 
standard ||||, more global, and more ambitious. Let lower case Greek letters identify 
basic chemical components. For each component pair aj, provide a geometric criterion 
that determines whether a particular pair of particles are clustered. Clusters are then 
identified in a many-body system by the rule that any pair of particles that satisfy the pair 
clustering criterion are members of the same cluster. Despite the simplicity of this definition, 
it holds a fundamental difficulty for theories of liquids: for intuitive clustering criteria, dense 
liquids are typically past the percolation threshold. The cluster size distribution will include 
large clusters that have to be directly considered. 

The second clustering concept was foreshadowed by the calculations of Hummer, et al. |J 
and is more local. Focus attention on only one particle. Again consider a definite geometric 
clustering criterion for all pairs of species types. Then the clusters are only those involving 
the distinguished particle as the central element, or nucleus. These are star- type clusters 
nucleated on the central element. For example, if the distinguished particle is of type a, 
the clusters considered are those for which (0,1,. . .) neighbors of the distinguished particle 
are within the geometric 07 clustering criterion for all 7. The size of these clusters will be 
limited by the maximum coordination number of the distinguished central particle. This 
will be a practical advantage. But there is the corresponding disadvantage: in the cases that 
particular extended clusters are expected on a physical grounds to be especially significant, 
those extended clusters may not have a direct role to play in this theory. 

Theories developed for these different clustering concepts will eventually diverge from 
each other. But they can be characterized by equations of similar form in which cluster 
properties play a decisive role. The derivation of these equations is our goal. 



2.1. Preliminaries 

Here we make some preliminary comments that serve to simplify the subsequent derivations 
and present some of the notation used. A central feature of this development is the potential 



distribution theorem |57j . For an atomic solute with no internal degrees of freedom this may 
be expressed as: 

P.K = (e- AU/RT )o z v . (1) 

z v = e^"' RT is the absolute activity of the v particle and A u is its thermal deBroglie wave- 
length. The subscripted brackets indicate the thermal average in the absence of interactions 
between the solvent and the solute (test particle). Here the averaged property is the Boltz- 
mann factor of the mechanical potential energy of interaction between solvent and solute. 
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An equivalent description of this bath factor is that (e~ AU / RT )o is the average of the Boltz- 
mann factor of the solute-solvent interactions over the thermal motion of the solute and 
solvent under the condition of no interactions between these two systems. [We note that 
these results are not limited to pairwise decomposable interactions; the quantity AU is the 
difference between the potential energy of the composite system and that of the separate 
non-interacting systems.] Permitting the possibility of internal degrees of freedom including 
orientational degrees of freedom, the required generalization is [fT? ]: 



PvV/q v = (e- AU / RT ) z v . (2) 

q v = q(N u = 1,V,T) is the canonical partition function for the system of one molecule of 
type v in volume V at temperature T. 

Fundamental results below that are central to our derivation can be viewed as formally 
exact generalizations of this potential distribution expression for the case that molecular 
clusters form. For those purposes, we will require some elaborations of notation. We suppose 
that a geometric criterion has been given by which a cluster of type M is recognized and that 
this criterion is expressed by an indicator function H M ; H M = 1 when a cluster of type M is 
formed and zero when it is not. An "M-clustered" configuration is one for which Hm = 1- 
The results below will involve the canonical partition function, qu for a cluster of type M; 
this is understood to be the partition function of the particles that compose a cluster of type 
M over the region Hm = 1- Suppose that an M-clustered configuration is given and consider 
placements of particles other than those that are M-clustered. Not all configurations of 
these additional particles can be permitted without contradiction of the specification that a 
particular M-cluster is present. A further extension of this notation will use H^\m = 1 to 
indicate that region wherein the N-M other particles in the N-body system are outside the 
clustering condition for an M-cluster. H N \ M = for positions of those additional species that 
are not permitted under the condition that the initially specified particles are M-clustered. 
We then consider bath factors denoted by {e~ AU l RT H n \m)o,m- This will indicate the average 
over the thermal motion of the M-cluster and the solvent under condition of no interaction 
between them. The averaged quantity involves the exclusion factor H N \ M in addition to the 
familiar Boltzmann factor of the solute-solvent interactions. This essential exclusion factor 
then assigns the value zero as the weight for those configurations for which the solvent 
penetrates the M-clustering volume. 



2.2. Global Clustering Definition 

In order to involve information on clusters, we express the density of interest in terms of 
cluster concentrations pM- Thus, for the density of the a particles, we would write 

Pa = Yl U olMPM (3) 
M 

where M identifies a molecular cluster considered, n a M is the number of a species in that 
cluster, and the sum is over all molecular clusters that can form. 
The cluster concentrations pM are obtained from 

Pu = (q M /V) (e- Au ^H N[M ) , M II • (4) 

7 
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<1m — Qm(T,V) is the canonical partition function covering configurations of an M-cluster 
|6|j9| JTl|JT^] . The indicated average utilizes the thermal distribution of cluster and solvent 
under the conditions that there is no interaction between them. AU is the potential energy 
of interaction between the cluster and the solvent. 

Eq. [| can be derived by considering the grand ensemble. The average number < Nm > 
of such clusters is composed as 

E(z,T,V)<N M > =n^ CTM (5) 
x £ Q(N,V,T\n M )l[( N Az^«. 

N>n M 7 \ n lM/ 

Here S(z, T, V) is the grand canonical partition function; iV 7 is the total number of 7 particles 
in the system; N is the set of particle numbers {N 7 • • •} and similarly is the set of particle 
numbers for the M-cluster, {n 7 M, ■ ■ ■}; Q(N, V, T\to.m) is the canonical ensemble partition 
function with the constraint that m\m specific particles are clustered. This constraint means 
that the general integrand range has been partitioned and this integration is weighted by 
HmHn\m for specific particles clustered. The binomial coefficient (J^) is the number 
of n 7 ^/-tuples of 7 particles that can be selected from iV 7 particles. Because of the particle 
number factors in the summand the partition function there can also be considered to be 
the partition function for N-n^ particles but with an extra, distinguished 11^ objects that 
constitute the cluster of interest. A natural distribution of those n M extraneous objects is the 
distribution they would have in an ideal gas phase; the Boltzmann factor for that distribution 
appears already in the integrand of the Q(N, V, T\xim) and the normalizing denominator for 
that distribution is Qm(T) YI^^m- The acquired factorials cancel the denominators of the 
binomial coefficients. The remaining fragments of the binomial coefficients merely adjust 
the factorials involved in the definition of Q(N,V,T\iim)- Final alignment of the dummy 
summation variables then leads to Eq. |J 

Combining our preceding results, we obtain 

Pa = Y, n Mm/V){e- m l RT H N \ M ) 0M \{z n 1 ^. (6) 

M 7 

This is an equation that might be solved for the absolute activities z=(z a , . . .) in terms of 
the densities, cluster partition functions, and the temperature. If the bath contribution is 
neglected, ( e ~ AU / RT H N \ M ) 0>M —1, this is exactly the relation of Eq. 11 of ||. This is the 
result that was sought. 

Though formally correct, there is a fundamental difficulty with this result. Consider a 
clustering definition in which particles at near-neighbor distances in solution are clustered. 
Then the sum will diverge as a percolation threshold is approached by increasing the den- 
sity of a dilute phase. This is true whether or not cluster interference is neglected. This 
divergence is sometimes taken as a practical indication of condensation at low temperatures. 
However, the sum will diverge at similar densities even if no condensation occurs. Thus, this 
formula is inapplicable to a liquid, traditionally defined, without further considerations. 

Notice however that there is a non-trivial special case for which Eq. ^| may be directly 
applied to a condensed phase and is likely to be helpful. This is the case where species a is a 
dilute solute and the interest is in the effect of the solvent on \i a . Then we may adopt such 
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a restricted definition that no solvent-solvent clusters can form. However, at the same time 
we can define the solute-solvent clustering criteria more physically and study those clusters 
in which solvent molecules bind to the solute of interest. Those clusters will be finite and 
the sum of Eq. ^ will involve only a finite number of terms. 

2.3. Local Clustering Definition 

We will derive the result needed through an indirect argument that utilizes the already 
derived Eq. ||. We wish to consider non-dilute systems and species for which Eq. || does 
not apply directly. We will find a way to use Eq. || by appropriately distinguishing single 
molecules in this non-dilute phase. 

We begin by noting the well-known fact that the chemical potential, say fi a , can 
be divided into ideal and interaction parts. The ideal contribution takes the form 
RT In p a + constant where the constant might be calculated on the basis of molecular prop- 
erties; see Eq. ([I]). The first step in our argument is the specification that the theory need 
only determine the interaction part of the chemical potential since the ideal contribution is 
well-known. To determine the interaction part of the chemical potential we distinguish a 
single molecule of type a and study its condition in solution. This is natural; for example 
a simulation calculation might select a particular a molecule and perform charging or un- 
charging calculations, or determine distributions of binding energies experienced PJ. When 
an a molecule is selected for the purposes of calculation of the interaction part of u a it can 
be treated as a solute at the lowest non-zero concentration, as a solitary impurity. We will 
denote the chemical potential of this distinguished solute as /x Q /, remembering that the inter- 
action part of a a i will be the same as the interaction part of fi a . For a dilute solute, we can 
define clustering criteria, as anticipated above, so that no solvent-solvent clustering occurs 
as defined, but the definition of clustering of solvent molecules about the distinguished a 
solute is naturally included. Eq. ||then does apply to the calculation of u a i. 

The modifications of Eq. |6| for this case are two: First, the stoichiometric coefficients of 
Eq. |3], that appear later in Eq. ||, are all one (1); since the distinguished solute is at the 
lowest non-zero concentration there cannot be more than one such solute in any cluster. 
The right side of Eq. |6| is precisely proportional to z a >. Second, all clusters are of star type, 
that is, AB n with the distinguished solute at the center. 

Before finally writing the desired result, we ask again about the ideal contribution to u a i 
and to u a . This ideal contribution is reflected in the density on the left of Eq. ||. For p a 
that density is the physical value and is part of the definition of the problem. For p a i, if our 
argument were taken literally, that density on the left would be p a i = 1/V. Replacement of 
that value by p a , and on the right simultaneously z a / by z a , would merely readjust u a i up 
to u a through a final assessment of the ideal contribution. Thus, we have 

Pa = £ (<lM/V)(e~ AU / RT H NlM ) 0M l[z;^. (7) 

M(a) 7 

The sum is over all clusters M(a) that can form on an a nucleus. 

A practical example of the importance of the clustering definition may be helpful. Recent 
work on clusters of a chloride ion with water has suggested that the preferred disposition of 
the chloride ion may be near the surface of the cluster . This interesting point is unlikely 
to be decisive for the application of this cluster formula to the study of the solvation of the 
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chloride ion in liquid water. The cluster definitions here require that the chloride be the 
nucleus of a star-type cluster. That would permit the chloride ion to access the physical 
surface of a droplet only for small clusters and larger clusters are likely to be decisive in 
establishing the bulk phase thermodynamics of the aqueous solutions containing chloride 
ions. 



2.4. Cluster Interference 

This paper will develop the local clustering alternative; the global clustering results will 
serve as contrast. The issue of cluster interference is different in the two cases. The develop- 
ment requires a complete and unique partitioning of phase space into regions characterized 
by a specific cluster population. For each proper configuration, the definition uniquely as- 
signs elementary particles to clusters. With this characteristic, we then formally regard the 
cluster populations as supplementary integration (summation) variables, first integrating 
over configurations with a specific constraint of a specific cluster population, then summing 
over permitted cluster populations. Cluster interference is a simple implementation of the 
constraint. If a particular cluster of type M is specified then configurations that violate the 
specification cannot be allowed. As a particular example, suppose that an A n cluster is under 
consideration. Then some configurations for which an additional A particle approaches the 
A n cluster must be excluded; otherwise configurations of A n+1 clusters would become con- 
fused with configurations of A n clusters. In the notation above these constraints are lumped 
into the factors ( e ~ AU / RT 'Hn\m)q,m through rigid exclusion interactions. It is in these bath 
factors and the cluster partition functions that cluster interference is expressed. 

For the global clustering development, these cluster interference contributions are compli- 
cated because they depend on all the cluster sizes and shapes. However, the global clustering 
result Eq. || is fundamentally inapplicable to liquids. 

In contrast, for the local clustering development cluster interference is much simpler. If 
we specify that a cluster of type A n is considered, we must only require that the n-1 particles 
are within the clustering volume v of a distinguished molecule (easy to do) and that no more 
than n-1 additional particles are there. This latter factor is familiar from studies of packing 
problems in liquids |59| and in the potential distribution factor it involves the probability 



that the clustering volume v is empty of solvent species pD| . We can consider the condition 
that the clustering volume is empty of solvent molecules by introducing the probability for 
that event, p . The van der Waals approximation p Q « (1 — pv) should be qualitatively 
satisfactory and provides definiteness to the discussion. Thus, within the local clustering 
approach, cluster interference is completely expressed by the form 

P - = Po E (WV) (e- AU/RT )l M II ' Z T ■ (8) 

Za M(a) 7 

Now the well-decorated term (e~ Au ^ RT )Q M indicates the average over the thermal motion 
of the M-cluster and solvent under that condition that the only interactions between them 
rigidly enforce the exclusion of the solvent from the M-clustered volume. We have factored 
out the z a because this quantity must be present in each term, because the ratio p a /z a is 
a standard form, and because the distinguished a particle that is the nucleus of the cluster 
requires a different treatment than do the particles on the periphery of the star. The notation 



7 



means that the term for the species nucleating the cluster should be stricken from the 
product. Though this result is formally complete, an approximate theory will have to be 
utilized for po in specific applications. 



2.5. Quasi-chemical Approximation 

A theory with quasi-chemical form is 
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This replacement is motivated by the desire to replace the 'bare fields' ln^ with effective 
fields, by the recognition that Eq. || provides a pattern for that replacement, and by the 
appreciation that the bath contributions might reasonably factor for species on the "surface" 
of the cluster. 

Note that the list of clusters M(a) should include the monomer. One term in this list 
for Eq. |8| will have qu = q a - Thus, we can write 



PaV 

z a q a 

where 



po(i+ E 'Km(t)U' p :^), (10) 

M(a) v 



and the sum of Eq. |TD| is over the list of possible star clusters nucleated by an a species but 
not including the monomer cluster. This theory deserves the appellation "quasi-chemical" 
because the coefficients Km(T) are the chemical equilibrium ratios for the formation of star 
clusters in a dilute gas Note, however, that the factor p will be essential for description 
of packing effects and thus for predictions of thermodynamic properties of condensed phases. 
The thermodynamic quantity sought is the chemical potential 

fi a « RT\n[p a V/q a ] - RT\n[p (l + £ 'K M (T) JJ 'p? M )]. (12) 

M(a) " 

This formula makes the conventional, helpful separation between the contributions of inter- 
molecular interactions and the non-interaction (ideal) terms; see Eq. 0. Quantities such as po 
and the K M {T) depend on parameters that define the clustering circumstances. But, since 
the physical problem is independent of those parameters, the theory should be insensitive 
to them if it is to be satisfactorily accurate. 



2.6. Discussion 



The formal results Eqs. [7| and ^ and the approximation Eq. [T^ are believed to be new. The 
approximation Eq. ^| attempts to eliminate the complicated bath contributions. These quan- 
tities are formally well-understood and can be formally analyzed with Mayer mathematical 
cluster expansions or functional analyses. Here we discuss physically what's neglected. For 
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dilute solutions where the solvent activities are known separately, Eq. || may be used di- 
rectly and the issues that follow here are not relevant. This was the pattern of the motivating 
calculations |j],[2|]. 

Eq. |^ assumes that each activity factor may be replaced using the relation Eq. ||] with 
no account of the interference between different sites that is formally expressed in the left- 
side of Eq. p. This might be correct for idealized circumstances, e.g. a "Bethe lattice" 
(no cycles). [An alternative derivation based upon diagrammatic arguments makes it clear 
that this is a tree approximation.] But for more realistic continuous problems there are 
two sources of that interference. Firstly, different peripheral sites on the star cluster can 
interfere with one another. This effect arises because solvent molecules can interact with two 
or more surface sites jointly. The organization of the problem here is designed to mitigate 
that interference between different surface species. Secondly, "through solvent" interference 
between a peripheral site and the nucleus of the cluster arises when solvent molecules can 
interact with a surface surface sites and the nucleus or cluster volume jointly. Eq. [5] neglects 
both of these effects. 

We reiterate that the quantities neglected by the approximation Eq. | are well understood 
formally. Thus, in specific applications it should be practical to augment this approximation 
with additional contributions that are understood physically and theoretically. An exam- 
ple should serve to clarify this point. There has been significant recent interest in primitive 
electrolyte solution models under circumstances where ion pairing and clustering is acknowl- 



edged to be of primary significance [p2[ . The formal developments here apply also to ion 
clustering in electrolyte solutions. However, where non-neutral clusters are important the 
approximation Eq. [| must be at the least augmented to treat long range interactions, likely 
following a random phase approximation. 



A more specific example is given by the study of the hexa-aquo ferric ion, Fe(H 2 0) 



3+ 
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reported in Reference |2j . The data given there allow us to estimate the error of the neglect of 
long-ranged interactions, Eq. [| That neglected contribution would be about 391 kcal/mol, 
or 38% of the value inferred from experiment for interaction part of HFe 3 +- [See Table IV 
of that report but note that the packing contribution here present as po was neglected in 
that previous study] Thus, for ionic solutes in particular, further consideration of coulomb 
ranged interactions will be necessary. Though the physical estimate given for the hexa- 
aquo ferric ion example was natural and surprisingly accurate, the value 391 kcal/mol 
was essentially empirically derived. To do better than that, the present approach must 
be extended to produce the dielectric constant of the liquid in order to assess screening of 
electrostatic interactions. Since the this approach has a conceptual overlap with the Onsager- 
Kirkwood p3] , |5^] development of the theory of dielectrics that subsequent step should be 
natural. We note, however, that the present hybrid approach will improve the treatment 
of negative ions in solution, such as the CT~ ion which was a remaining difficulty for the 
multistate gaussian model [[[]. 

Finally, the hexa-aquo ferric ion example emphasizes that current electronic structure 
software can routinely produce Km{T) in a harmonic approximation. Other recent exam- 
ples include the work of references [|65| -|68||. This encourages us to anticipate that further 
developments will permit implementation of these theories without an intermediate effort to 
obtain approximate force field models. In the near term, this approach should at the least 
offer a direction for improvement of electronic structure calculations on solution species, 
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calculations that might presently rely solely on dielectric continuum models. 
3. Conclusions 

The traditional development of theories of fluids based upon the concept of physical clus- 
tering can be adapted to an alternative local clustering definition. The alternative clustering 
definition can straightforwardly preserve a detailed valence description of the interactions 
between a solution species and its near-neighbors, i.e., cooperativity and saturation of coor- 
dination for strong association. These clusters remain finite even for condensed phases. The 
simplest theory to which these developments lead is analogous to quasi-chemical theories of 
cooperative phenomena. The present quasi-chemical theories require additional considera- 
tion of packing issues because they don't impose lattice discretizations on the continuous 
problem. These quasi-chemical theories do not require pair decomposable interaction poten- 
tial energy models. Since calculations may be required only for moderately sized clusters, 
we anticipated that these quasi-chemical theories could be implemented with computational 
tools of current molecular electronic structure theory. 
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